Harden driver IPC and registry client robustness#33
Merged
Conversation
driver/ipc.go: drop late IPC replies after timeout instead of mis-correlating them with the next request. Each sendAndWait now registers a private, single-use waiter slot; readLoop delivers only to the active waiter and only when the reply cmd matches (cmdError always allowed). On timeout/disconnect the slot is abandoned, so a reply for an already-timed-out request finds no matching waiter and is dropped rather than handed to an unrelated caller. Full cross-process correctness still needs daemon-side request IDs (coordinated change + version bump), documented as a TODO. driver/driver.go: DialAddr now applies a 30s default timeout via DialAddrTimeout so a non-responsive daemon can't block forever; Listen and Broadcast get the same bound. jsonRPC paths (incl. WaitForTrust, which blocks in the daemon by design) keep the unbounded path. Document SendTo as fire-and-forget (nil means local IPC enqueue, not delivery). driver/conn.go: serialize Read with a dedicated mutex so recvBuf can't be corrupted by concurrent readers; implement SetWriteDeadline (was a silent no-op) backed by a writeDeadline checked in Write; SetDeadline now sets both read and write deadlines; document Write success as local IPC enqueue, not remote delivery. registry/client/client.go: bound every initial TCP/TLS dial with a 5s timeout (matching the reconnect paths) so registry ops can't hang on an unreachable host. Tests: add TestLateReplyAfterTimeoutNotMisdelivered (verified to fail without the waiter cmd-match guard) and a real SetWriteDeadline behavior test; update fixtures for the removed shared pending buffer.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Robustness/correctness hardening for the
driverIPC layer and theregistry/clientdialer. No wire-protocol or behavior changes for existing callers beyond added safety bounds.Fixes
1. Stale IPC reply mis-correlation (client-side mitigation) —
driver/ipc.goThe IPC wire protocol has no request IDs and the daemon dispatches requests concurrently, so a reply that arrives after its request timed out could be consumed by a later, unrelated request (wrong
conn_id/result). Previously all replies funneled into one sharedpendingbuffer that the next caller drained.Now each
sendAndWaitregisters a private, single-use waiter slot.readLoopdelivers a reply only to the currently active waiter, and only when the replycmdmatches what that waiter expects (cmdErroris always accepted). On timeout/disconnect the waiter is abandoned, so:cmdmismatch (the dangerous case — e.g. a dial reply landing on a health request).Deferred: full cross-process ordering correctness needs daemon-side request IDs echoed in every reply (a coordinated daemon change + version bump). That is intentionally not done here and is marked
TODO(PILOT). This client-side scheme only guarantees a late reply is not mis-delivered; it cannot recover the abandoned request's actual result, and two same-cmdreplies remain indistinguishable without request IDs.Regression test:
TestLateReplyAfterTimeoutNotMisdelivered— confirmed to fail without the waiter cmd-match guard ("unexpected reply 0x0E") and pass with it.2.
DialAddrcould hang forever —driver/driver.goDialAddrusedsendAndWait(no timeout). It now delegates toDialAddrTimeoutwith a 30sdefaultDialTimeout. The other boundedsendAndWaitcallers —ListenandBroadcast— get the same bound.jsonRPC-based control ops keep the unbounded path becauseWaitForTrustlegitimately blocks in the daemon.3.
Connnet.Conn conformance —driver/conn.gorecvBufis now guarded by a dedicatedreadMuso concurrent readers can't corrupt it (single-reader contract documented).SetWriteDeadlinewas a silent no-op; it is now implemented, backed by awriteDeadlinechecked before/between Write chunks.SetDeadlinesets both read and write deadlines.Writegodoc clarified: success means the bytes were enqueued to the local daemon over IPC, not transmitted/acknowledged by the peer.4. Registry client dial timeouts —
registry/client/client.goEvery initial
net.Dial/tls.Dial(Dial, DialPool, DialTLS, DialTLSPool, initPool, DialTLSPinned) now uses a 5sdialTimeout(matching the existing reconnect paths) so startup/registry ops can't hang on an unreachable host.5. Datagram send semantics —
driver/driver.goSendTogodoc made explicit: fire-and-forget; a nil return only means the frame was enqueued to the local daemon, not delivered. Semantics unchanged.Validation
GOWORK=off go build ./...— cleanGOWORK=off go vet ./...— cleangofmt— all changed files cleanGOWORK=off go test -race -parallel 4 ./...— all 15 packages passgosec ./badgeverify/...— 0 issues (CI scope)govulncheck ./...— no vulnerabilitiesgitleaks detect— no leaksDeferred work
Full request-ID protocol for IPC (daemon + driver, version-bumped) to make cross-process reply ordering provably correct. Tracked via
TODO(PILOT)indriver/ipc.go.